Dataset statistics
| Number of variables | 12 |
|---|---|
| Number of observations | 2935849 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 6 |
| Duplicate rows (%) | < 0.1% |
| Total size in memory | 235.2 MiB |
| Average record size in memory | 84.0 B |
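The memory figures above can be cross-checked with a little arithmetic. The sketch below assumes 1 MiB = 1,048,576 bytes and uses the per-variable "Memory size" values listed in this report (nine variables at 22.4 MiB, three at 11.2 MiB). The 8-byte and 4-byte element widths are an inference from those sizes, not something the report states.

```python
# Consistency check of the memory figures in the dataset statistics.
MIB = 1024 ** 2            # bytes per MiB
n_rows = 2_935_849         # "Number of observations"
total_mib = 235.2          # "Total size in memory"

# Average record size: total bytes divided by the number of rows.
avg_record_bytes = total_mib * MIB / n_rows
print(f"average record size: {avg_record_bytes:.1f} B")

# Assumed element widths (inferred, not stated in the report):
# 8 bytes per value for the 22.4 MiB variables, 4 bytes for the 11.2 MiB ones.
wide_mib = n_rows * 8 / MIB
narrow_mib = n_rows * 4 / MIB
print(f"8-byte column: {wide_mib:.1f} MiB, 4-byte column: {narrow_mib:.1f} MiB")

# Nine wide and three narrow variables should add up to the reported total.
estimated_total_mib = 9 * wide_mib + 3 * narrow_mib
print(f"estimated total: {estimated_total_mib:.1f} MiB")
```

Under these assumptions each row occupies 9 × 8 + 3 × 4 = 84 bytes, which agrees with the reported average record size.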
Variable types
| NUM | 10 |
|---|---|
| DATE | 1 |
| CAT | 1 |
Warnings
| Warning | Type |
|---|---|
| Dataset has 6 (< 0.1%) duplicate rows | Duplicates |
| year is highly correlated with date_block_num | High correlation |
| date_block_num is highly correlated with year | High correlation |
| yrday is highly correlated with month | High correlation |
| month is highly correlated with yrday | High correlation |
| item_cnt_day is highly skewed (γ1 = 272.8331617) | Skewed |
| date_block_num has 115690 (3.9%) zeros | Zeros |
| weekday has 337074 (11.5%) zeros | Zeros |
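The skewness γ1 quoted for item_cnt_day measures how asymmetric a distribution is. As a minimal sketch, the function below computes the population form, the third central moment divided by the 1.5th power of the second. The report's own estimator may apply a bias correction, so this is an illustration of the idea rather than a reproduction of the reported value, and the toy data is hypothetical.

```python
def skewness(values):
    """Population skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    return m3 / m2 ** 1.5

# Hypothetical toy data: almost every value is 1, with one large outlier,
# loosely mimicking a column dominated by a single value.
sample = [1] * 99 + [1000]
print(skewness(sample))            # strongly positive
print(skewness([1, 2, 3, 4, 5]))   # symmetric data has zero skewness
```

A single extreme value is enough to push the coefficient far from zero, which is consistent with a column whose quartiles all equal 1 but whose maximum is 2169.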
Reproduction
| Analysis started | 2020-09-24 20:50:10.637737 |
|---|---|
| Analysis finished | 2020-09-24 20:57:58.824405 |
| Duration | 7 minutes and 48.19 seconds |
| Software version | pandas-profiling v2.9.0 |
| Download configuration | config.yaml |
date
Date
| Distinct | 1034 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 22.4 MiB |
| Minimum | 2013-01-01 00:00:00 |
| Maximum | 2015-10-31 00:00:00 |
Histogram with fixed size bins (bins=50)
date_block_num
Real number (ℝ≥0)
| Distinct | 34 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 14.56991146 |
| Minimum | 0 |
| Maximum | 33 |
| Zeros | 115690 |
| Zeros (%) | 3.9% |
| Memory size | 22.4 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 7 |
| median | 14 |
| Q3 | 23 |
| 95-th percentile | 31 |
| Maximum | 33 |
| Range | 33 |
| Interquartile range (IQR) | 16 |
Descriptive statistics
| Standard deviation | 9.422987709 |
|---|---|
| Coefficient of variation (CV) | 0.6467429629 |
| Kurtosis | -1.082868996 |
| Mean | 14.56991146 |
| Median Absolute Deviation (MAD) | 8 |
| Skewness | 0.2038579466 |
| Sum | 42775060 |
| Variance | 88.79269736 |
| Monotonicity | Increasing |
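Several entries in the two tables above are derived from others, so they can be recomputed as a sanity check. The sketch below uses only the values printed in this report. The variable names are illustrative.

```python
# Derived statistics recomputed from the reported primary values.
mean = 14.56991146
std = 9.422987709
q1, q3 = 7, 23
minimum, maximum = 0, 33

variance = std ** 2              # variance is the squared standard deviation
cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"variance={variance:.4f} cv={cv:.4f} iqr={iqr} range={value_range}")
```

The recomputed variance, CV, IQR and range agree with the reported 88.79269736, 0.6467429629, 16 and 33 up to rounding.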
Histogram with fixed size bins (bins=34)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 11 | 143246 | 4.9% | |
| 23 | 130786 | 4.5% | |
| 2 | 121347 | 4.1% | |
| 0 | 115690 | 3.9% | |
| 1 | 108613 | 3.7% | |
| 7 | 104772 | 3.6% | |
| 6 | 100548 | 3.4% | |
| 5 | 100403 | 3.4% | |
| 12 | 99349 | 3.4% | |
| 10 | 96736 | 3.3% | |
| Other values (24) | 1814359 | 61.8% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 115690 | 3.9% | |
| 1 | 108613 | 3.7% | |
| 2 | 121347 | 4.1% | |
| 3 | 94109 | 3.2% | |
| 4 | 91759 | 3.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 33 | 53514 | 1.8% | |
| 32 | 50588 | 1.7% | |
| 31 | 57029 | 1.9% | |
| 30 | 55549 | 1.9% | |
| 29 | 54617 | 1.9% |
shop_id
Real number (ℝ≥0)
| Distinct | 60 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 33.00172829 |
| Minimum | 0 |
| Maximum | 59 |
| Zeros | 9857 |
| Zeros (%) | 0.3% |
| Memory size | 22.4 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 6 |
| Q1 | 22 |
| median | 31 |
| Q3 | 47 |
| 95-th percentile | 57 |
| Maximum | 59 |
| Range | 59 |
| Interquartile range (IQR) | 25 |
Descriptive statistics
| Standard deviation | 16.22697305 |
|---|---|
| Coefficient of variation (CV) | 0.4917007044 |
| Kurtosis | -1.025358056 |
| Mean | 33.00172829 |
| Median Absolute Deviation (MAD) | 13 |
| Skewness | -0.07236142921 |
| Sum | 96888091 |
| Variance | 263.3146543 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 31 | 235636 | 8.0% | |
| 25 | 186104 | 6.3% | |
| 54 | 143480 | 4.9% | |
| 28 | 142234 | 4.8% | |
| 57 | 117428 | 4.0% | |
| 42 | 109253 | 3.7% | |
| 27 | 105366 | 3.6% | |
| 6 | 82663 | 2.8% | |
| 58 | 71441 | 2.4% | |
| 56 | 69573 | 2.4% | |
| Other values (50) | 1672671 | 57.0% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 9857 | 0.3% | |
| 1 | 5678 | 0.2% | |
| 2 | 25991 | 0.9% | |
| 3 | 25532 | 0.9% | |
| 4 | 38242 | 1.3% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 59 | 42108 | 1.4% | |
| 58 | 71441 | 2.4% | |
| 57 | 117428 | 4.0% | |
| 56 | 69573 | 2.4% | |
| 55 | 34769 | 1.2% |
item_id
Real number (ℝ≥0)
| Distinct | 21807 |
|---|---|
| Distinct (%) | 0.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10197.22706 |
| Minimum | 0 |
| Maximum | 22169 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Memory size | 22.4 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1540 |
| Q1 | 4476 |
| median | 9343 |
| Q3 | 15684 |
| 95-th percentile | 20949 |
| Maximum | 22169 |
| Range | 22169 |
| Interquartile range (IQR) | 11208 |
Descriptive statistics
| Standard deviation | 6324.297354 |
|---|---|
| Coefficient of variation (CV) | 0.6201977575 |
| Kurtosis | -1.225209966 |
| Mean | 10197.22706 |
| Median Absolute Deviation (MAD) | 5492 |
| Skewness | 0.2571735482 |
| Sum | 2.993751886e+10 |
| Variance | 39996737.02 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 20949 | 31340 | 1.1% | |
| 5822 | 9408 | 0.3% | |
| 17717 | 9067 | 0.3% | |
| 2808 | 7479 | 0.3% | |
| 4181 | 6853 | 0.2% | |
| 7856 | 6602 | 0.2% | |
| 3732 | 6475 | 0.2% | |
| 2308 | 6320 | 0.2% | |
| 4870 | 5811 | 0.2% | |
| 3734 | 5805 | 0.2% | |
| Other values (21797) | 2840689 | 96.8% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 1 | < 0.1% | |
| 1 | 6 | < 0.1% | |
| 2 | 2 | < 0.1% | |
| 3 | 2 | < 0.1% | |
| 4 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 22169 | 1 | < 0.1% | |
| 22168 | 6 | < 0.1% | |
| 22167 | 1114 | < 0.1% | |
| 22166 | 270 | < 0.1% | |
| 22165 | 2 | < 0.1% |
item_price
Real number (ℝ)
| Distinct | 19993 |
|---|---|
| Distinct (%) | 0.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 890.8532327 |
| Minimum | -1 |
| Maximum | 307980 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 22.4 MiB |
Quantile statistics
| Minimum | -1 |
|---|---|
| 5-th percentile | 99 |
| Q1 | 249 |
| median | 399 |
| Q3 | 999 |
| 95-th percentile | 2690 |
| Maximum | 307980 |
| Range | 307981 |
| Interquartile range (IQR) | 750 |
Descriptive statistics
| Standard deviation | 1729.799631 |
|---|---|
| Coefficient of variation (CV) | 1.941733573 |
| Kurtosis | 445.5328258 |
| Mean | 890.8532327 |
| Median Absolute Deviation (MAD) | 250 |
| Skewness | 10.7504227 |
| Sum | 2615410572 |
| Variance | 2992206.762 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 299 | 291352 | 9.9% | |
| 399 | 242603 | 8.3% | |
| 149 | 218432 | 7.4% | |
| 199 | 184044 | 6.3% | |
| 349 | 101461 | 3.5% | |
| 599 | 95673 | 3.3% | |
| 999 | 82784 | 2.8% | |
| 799 | 77882 | 2.7% | |
| 249 | 77685 | 2.6% | |
| 699 | 76493 | 2.6% | |
| Other values (19983) | 1487440 | 50.7% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| -1 | 1 | < 0.1% | |
| 0.07 | 2 | < 0.1% | |
| 0.0875 | 1 | < 0.1% | |
| 0.09 | 1 | < 0.1% | |
| 0.1 | 2932 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 307980 | 1 | < 0.1% | |
| 59200 | 1 | < 0.1% | |
| 50999 | 1 | < 0.1% | |
| 49782 | 1 | < 0.1% | |
| 42990 | 4 | < 0.1% |
item_cnt_day
Real number (ℝ)
| Distinct | 198 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.242640885 |
| Minimum | -22 |
| Maximum | 2169 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 22.4 MiB |
Quantile statistics
| Minimum | -22 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 2 |
| Maximum | 2169 |
| Range | 2191 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 2.618834431 |
|---|---|
| Coefficient of variation (CV) | 2.107474864 |
| Kurtosis | 177478.0988 |
| Mean | 1.242640885 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 272.8331617 |
| Sum | 3648206 |
| Variance | 6.858293776 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 2629372 | 89.6% | |
| 2 | 194201 | 6.6% | |
| 3 | 47350 | 1.6% | |
| 4 | 19685 | 0.7% | |
| 5 | 10474 | 0.4% | |
| -1 | 7252 | 0.2% | |
| 6 | 6338 | 0.2% | |
| 7 | 4057 | 0.1% | |
| 8 | 2903 | 0.1% | |
| 9 | 2177 | 0.1% | |
| Other values (188) | 12040 | 0.4% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| -22 | 1 | < 0.1% | |
| -16 | 1 | < 0.1% | |
| -9 | 1 | < 0.1% | |
| -6 | 2 | < 0.1% | |
| -5 | 4 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2169 | 1 | < 0.1% | |
| 1000 | 1 | < 0.1% | |
| 669 | 1 | < 0.1% | |
| 637 | 1 | < 0.1% | |
| 624 | 1 | < 0.1% |
day
Real number (ℝ≥0)
| Distinct | 31 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15.85266715 |
| Minimum | 1 |
| Maximum | 31 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 11.2 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 8 |
| median | 16 |
| Q3 | 24 |
| 95-th percentile | 30 |
| Maximum | 31 |
| Range | 30 |
| Interquartile range (IQR) | 16 |
Descriptive statistics
| Standard deviation | 8.923482976 |
|---|---|
| Coefficient of variation (CV) | 0.5629010495 |
| Kurtosis | -1.222018961 |
| Mean | 15.85266715 |
| Median Absolute Deviation (MAD) | 8 |
| Skewness | -0.005873321383 |
| Sum | 46541037 |
| Variance | 79.62854842 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=31)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2 | 103372 | 3.5% | |
| 7 | 102273 | 3.5% | |
| 22 | 101345 | 3.5% | |
| 23 | 101339 | 3.5% | |
| 8 | 100986 | 3.4% | |
| 21 | 100208 | 3.4% | |
| 28 | 99813 | 3.4% | |
| 3 | 99027 | 3.4% | |
| 27 | 98952 | 3.4% | |
| 6 | 98058 | 3.3% | |
| Other values (21) | 1930476 | 65.8% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 94421 | 3.2% | |
| 2 | 103372 | 3.5% | |
| 3 | 99027 | 3.4% | |
| 4 | 94469 | 3.2% | |
| 5 | 95436 | 3.3% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 31 | 67601 | 2.3% | |
| 30 | 97436 | 3.3% | |
| 29 | 90899 | 3.1% | |
| 28 | 99813 | 3.4% | |
| 27 | 98952 | 3.4% |
month
Real number (ℝ≥0)
| Distinct | 12 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.247716759 |
| Minimum | 1 |
| Maximum | 12 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 11.2 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 6 |
| Q3 | 9 |
| 95-th percentile | 12 |
| Maximum | 12 |
| Range | 11 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 3.536219343 |
|---|---|
| Coefficient of variation (CV) | 0.5660018658 |
| Kurtosis | -1.236332881 |
| Mean | 6.247716759 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 0.09620076191 |
| Sum | 18342353 |
| Variance | 12.50484724 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=12)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 303561 | 10.3% | |
| 3 | 284057 | 9.7% | |
| 12 | 274032 | 9.3% | |
| 2 | 270251 | 9.2% | |
| 8 | 248415 | 8.5% | |
| 6 | 237428 | 8.1% | |
| 7 | 234857 | 8.0% | |
| 4 | 228289 | 7.8% | |
| 10 | 227077 | 7.7% | |
| 5 | 224836 | 7.7% | |
| Other values (2) | 403046 | 13.7% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 303561 | 10.3% | |
| 2 | 270251 | 9.2% | |
| 3 | 284057 | 9.7% | |
| 4 | 228289 | 7.8% | |
| 5 | 224836 | 7.7% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 12 | 274032 | 9.3% | |
| 11 | 183164 | 6.2% | |
| 10 | 227077 | 7.7% | |
| 9 | 219882 | 7.5% | |
| 8 | 248415 | 8.5% |
year
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 11.2 MiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2013 | 1267562 | 43.2% | |
| 2014 | 1055861 | 36.0% | |
| 2015 | 612426 | 20.9% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 4 |
|---|---|
| Median length | 4 |
| Mean length | 4 |
| Min length | 4 |
weekday
Real number (ℝ≥0)
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.365686382 |
| Minimum | 0 |
| Maximum | 6 |
| Zeros | 337074 |
| Zeros (%) | 11.5% |
| Memory size | 22.4 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 2 |
| median | 4 |
| Q3 | 5 |
| 95-th percentile | 6 |
| Maximum | 6 |
| Range | 6 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 1.996795044 |
|---|---|
| Coefficient of variation (CV) | 0.5932801863 |
| Kurtosis | -1.203917591 |
| Mean | 3.365686382 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | -0.2763606994 |
| Sum | 9881147 |
| Variance | 3.987190447 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=7)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 5 | 590359 | 20.1% | |
| 6 | 503104 | 17.1% | |
| 4 | 439298 | 15.0% | |
| 3 | 367280 | 12.5% | |
| 2 | 352962 | 12.0% | |
| 1 | 345772 | 11.8% | |
| 0 | 337074 | 11.5% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 337074 | 11.5% | |
| 1 | 345772 | 11.8% | |
| 2 | 352962 | 12.0% | |
| 3 | 367280 | 12.5% | |
| 4 | 439298 | 15.0% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 6 | 503104 | 17.1% | |
| 5 | 590359 | 20.1% | |
| 4 | 439298 | 15.0% | |
| 3 | 367280 | 12.5% | |
| 2 | 352962 | 12.0% |
yrday
Real number (ℝ≥0)
| Distinct | 365 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 173.6892047 |
| Minimum | 0 |
| Maximum | 364 |
| Zeros | 5194 |
| Zeros (%) | 0.2% |
| Memory size | 22.4 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 12 |
| Q1 | 75 |
| median | 170 |
| Q3 | 266 |
| 95-th percentile | 353 |
| Maximum | 364 |
| Range | 364 |
| Interquartile range (IQR) | 191 |
Descriptive statistics
| Standard deviation | 108.698348 |
|---|---|
| Coefficient of variation (CV) | 0.6258209781 |
| Kurtosis | -1.20685451 |
| Mean | 173.6892047 |
| Median Absolute Deviation (MAD) | 95 |
| Skewness | 0.1107832354 |
| Sum | 509925278 |
| Variance | 11815.33086 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 18652 | 0.6% | |
| 363 | 18462 | 0.6% | |
| 52 | 17723 | 0.6% | |
| 2 | 17120 | 0.6% | |
| 53 | 16818 | 0.6% | |
| 361 | 16804 | 0.6% | |
| 364 | 16112 | 0.5% | |
| 362 | 15840 | 0.5% | |
| 3 | 15588 | 0.5% | |
| 4 | 15057 | 0.5% | |
| Other values (355) | 2767673 | 94.3% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 5194 | 0.2% | |
| 1 | 18652 | 0.6% | |
| 2 | 17120 | 0.6% | |
| 3 | 15588 | 0.5% | |
| 4 | 15057 | 0.5% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 364 | 16112 | 0.5% | |
| 363 | 18462 | 0.6% | |
| 362 | 15840 | 0.5% | |
| 361 | 16804 | 0.6% | |
| 360 | 14252 | 0.5% |
item_category_id
Real number (ℝ≥0)
| Distinct | 84 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 40.0013829 |
| Minimum | 0 |
| Maximum | 83 |
| Zeros | 3 |
| Zeros (%) | < 0.1% |
| Memory size | 22.4 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 19 |
| Q1 | 28 |
| median | 40 |
| Q3 | 55 |
| 95-th percentile | 71 |
| Maximum | 83 |
| Range | 83 |
| Interquartile range (IQR) | 27 |
Descriptive statistics
| Standard deviation | 17.10075855 |
|---|---|
| Coefficient of variation (CV) | 0.4275041838 |
| Kurtosis | -0.5251578565 |
| Mean | 40.0013829 |
| Median Absolute Deviation (MAD) | 15 |
| Skewness | 0.3182825248 |
| Sum | 117438020 |
| Variance | 292.435943 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 40 | 564652 | 19.2% | |
| 30 | 351591 | 12.0% | |
| 55 | 339585 | 11.6% | |
| 19 | 208219 | 7.1% | |
| 37 | 192674 | 6.6% | |
| 23 | 146789 | 5.0% | |
| 28 | 121539 | 4.1% | |
| 20 | 79058 | 2.7% | |
| 63 | 53845 | 1.8% | |
| 65 | 53227 | 1.8% | |
| Other values (74) | 824670 | 28.1% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 3 | < 0.1% | |
| 1 | 2 | < 0.1% | |
| 2 | 18461 | 0.6% | |
| 3 | 25283 | 0.9% | |
| 4 | 2304 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 83 | 7206 | 0.2% | |
| 82 | 4390 | 0.1% | |
| 81 | 795 | < 0.1% | |
| 80 | 1325 | < 0.1% | |
| 79 | 9067 | 0.3% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
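The calculation described above (covariance divided by the product of the standard deviations) can be sketched in plain Python. The function name and the sample data are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/n factors cancel between numerator and denominator.
    return cov / math.sqrt(var_x * var_y)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]
r = pearson_r(xs, ys)
# Shifting and rescaling one variable leaves r unchanged.
r_shifted = pearson_r([10 * x + 3 for x in xs], ys)
print(r, r_shifted)
```

The second call illustrates the invariance under changes in location and scale: both results are the same up to floating-point error.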
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
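As a minimal sketch of that definition, ρ can be computed by ranking both variables and applying the Pearson formula to the ranks. Giving tied values the average of their ranks is a common convention and is assumed here. The helper names are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of equal values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]      # nonlinear but strictly increasing
print(spearman_rho(xs, ys))    # 1.0
```

Because the cubic relationship preserves order, the ranks of both variables are identical and ρ is exactly 1, whereas Pearson's r on the raw values would be below 1.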
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
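The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements it directly. Statistical libraries often default to the tie-adjusted tau-b instead, so results can differ when the data contain ties. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # A pair is concordant when both variables move in the same direction.
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```

In this example five of the six pairs are concordant and one is discordant, giving τ = (5 - 1) / 6.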
Phik (φk)
Phik (φk) is a practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available for Phik.
First rows
| | date | date_block_num | shop_id | item_id | item_price | item_cnt_day | day | month | year | weekday | yrday | item_category_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013-01-02 | 0 | 59 | 22154 | 999.00 | 1.0 | 2 | 1 | 2013 | 2 | 1 | 37 |
| 1 | 2013-01-03 | 0 | 25 | 2552 | 899.00 | 1.0 | 3 | 1 | 2013 | 3 | 2 | 58 |
| 2 | 2013-01-05 | 0 | 25 | 2552 | 899.00 | -1.0 | 5 | 1 | 2013 | 5 | 4 | 58 |
| 3 | 2013-01-06 | 0 | 25 | 2554 | 1709.05 | 1.0 | 6 | 1 | 2013 | 6 | 5 | 58 |
| 4 | 2013-01-15 | 0 | 25 | 2555 | 1099.00 | 1.0 | 15 | 1 | 2013 | 1 | 14 | 56 |
| 5 | 2013-01-10 | 0 | 25 | 2564 | 349.00 | 1.0 | 10 | 1 | 2013 | 3 | 9 | 59 |
| 6 | 2013-01-02 | 0 | 25 | 2565 | 549.00 | 1.0 | 2 | 1 | 2013 | 2 | 1 | 56 |
| 7 | 2013-01-04 | 0 | 25 | 2572 | 239.00 | 1.0 | 4 | 1 | 2013 | 4 | 3 | 55 |
| 8 | 2013-01-11 | 0 | 25 | 2572 | 299.00 | 1.0 | 11 | 1 | 2013 | 4 | 10 | 55 |
| 9 | 2013-01-03 | 0 | 25 | 2573 | 299.00 | 3.0 | 3 | 1 | 2013 | 3 | 2 | 55 |
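The calendar columns in these rows appear to be derived from date, which would also explain the high correlations reported between year and date_block_num and between month and yrday. The sketch below recomputes them with the standard library. The conventions (weekday 0 = Monday, zero-based yrday, date_block_num counting months since January 2013) are inferred from the sample rows and are not stated in the report. The function name is hypothetical.

```python
from datetime import date

def derive_calendar_features(d, first_year=2013):
    """Recompute the calendar columns from a date.

    Assumptions inferred from the sample rows, not stated in the report:
    weekday is 0 for Monday, yrday is the zero-based day of the year,
    and date_block_num counts months elapsed since January of first_year.
    """
    return {
        "day": d.day,
        "month": d.month,
        "year": d.year,
        "weekday": d.weekday(),
        "yrday": d.timetuple().tm_yday - 1,
        "date_block_num": (d.year - first_year) * 12 + (d.month - 1),
    }

early = derive_calendar_features(date(2013, 1, 15))
late = derive_calendar_features(date(2015, 10, 24))
print(early)
print(late)
```

For 2013-01-15 this gives weekday 1 and yrday 14, and for 2015-10-24 it gives weekday 5, yrday 296 and date_block_num 33, matching the corresponding rows in the sample tables.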
Last rows
| | date | date_block_num | shop_id | item_id | item_price | item_cnt_day | day | month | year | weekday | yrday | item_category_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2935839 | 2015-10-24 | 33 | 25 | 7315 | 399.0 | 1.0 | 24 | 10 | 2015 | 5 | 296 | 55 |
| 2935840 | 2015-10-31 | 33 | 25 | 7409 | 299.0 | 1.0 | 31 | 10 | 2015 | 5 | 303 | 55 |
| 2935841 | 2015-10-11 | 33 | 25 | 7393 | 349.0 | 1.0 | 11 | 10 | 2015 | 6 | 283 | 55 |
| 2935842 | 2015-10-10 | 33 | 25 | 7384 | 749.0 | 1.0 | 10 | 10 | 2015 | 5 | 282 | 55 |
| 2935843 | 2015-10-09 | 33 | 25 | 7409 | 299.0 | 1.0 | 9 | 10 | 2015 | 4 | 281 | 55 |
| 2935844 | 2015-10-10 | 33 | 25 | 7409 | 299.0 | 1.0 | 10 | 10 | 2015 | 5 | 282 | 55 |
| 2935845 | 2015-10-09 | 33 | 25 | 7460 | 299.0 | 1.0 | 9 | 10 | 2015 | 4 | 281 | 55 |
| 2935846 | 2015-10-14 | 33 | 25 | 7459 | 349.0 | 1.0 | 14 | 10 | 2015 | 2 | 286 | 55 |
| 2935847 | 2015-10-22 | 33 | 25 | 7440 | 299.0 | 1.0 | 22 | 10 | 2015 | 3 | 294 | 57 |
| 2935848 | 2015-10-03 | 33 | 25 | 7460 | 299.0 | 1.0 | 3 | 10 | 2015 | 5 | 275 | 55 |
Most frequent
| | date | date_block_num | shop_id | item_id | item_price | item_cnt_day | day | month | year | weekday | yrday | item_category_id | count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013-01-05 | 0 | 54 | 20130 | 149.0 | 1.0 | 5 | 1 | 2013 | 5 | 4 | 40 | 2 |
| 1 | 2014-02-23 | 13 | 50 | 3423 | 999.0 | 1.0 | 23 | 2 | 2014 | 6 | 53 | 23 | 2 |
| 2 | 2014-03-23 | 14 | 21 | 3423 | 999.0 | 1.0 | 23 | 3 | 2014 | 6 | 81 | 23 | 2 |
| 3 | 2014-05-01 | 16 | 50 | 3423 | 999.0 | 1.0 | 1 | 5 | 2014 | 3 | 120 | 23 | 2 |
| 4 | 2014-07-12 | 18 | 25 | 3423 | 999.0 | 1.0 | 12 | 7 | 2014 | 5 | 192 | 23 | 2 |
| 5 | 2014-12-31 | 23 | 42 | 21619 | 499.0 | 1.0 | 31 | 12 | 2014 | 2 | 364 | 37 | 2 |
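The table above lists each fully duplicated row together with its number of occurrences. A minimal sketch of how such a listing can be produced with the standard library follows. The function name and the miniature table are hypothetical, and only a subset of the columns is used for brevity.

```python
from collections import Counter

def duplicate_rows(rows):
    """Return {row: count} for rows that occur more than once."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}

# Hypothetical miniature table: (date, shop_id, item_id, item_price, item_cnt_day)
rows = [
    ("2013-01-05", 54, 20130, 149.0, 1.0),
    ("2013-01-05", 54, 20130, 149.0, 1.0),
    ("2013-01-06", 54, 20130, 149.0, 1.0),
]
print(duplicate_rows(rows))
```

Only rows that match in every column are counted, so the third row, which differs in date, is not reported.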